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Abstract 



Regularization approaches have demonstrated their effectiveness for solving ill-posed prob- 
lems. However, in the context of variational restoration methods, a challenging question re- 
mains, namely how to find a good regularizer. While total variation introduces staircase effects, 
wavelet domain regularization brings other artefacts, e.g. ringing. However, a trade-off can be 
made by introducing a hybrid regularization including several terms non necessarily acting in 
the same domain (e.g. spatial and wavelet transform domains). While this approach was shown 
to provide good results for solving deconvolution problems in the presence of additive Gaussian 
noise, an important issue is to efficiently deal with this hybrid regularization for more general 
noise models. To solve this problem, we adopt a convex optimization framework where the 
criterion to be minimized is split in the sum of more than two terms. For spatial domain reg- 
ularization, isotropic or anisotropic total variation definitions using various gradient filters are 
considered. An accelerated version of the Parallel Proximal Algorithm is proposed to perform 
the minimization. Some difficulties in the computation of the proximity operators involved in 
this algorithm are also addressed in this paper. Numerical experiments performed in the con- 
text of Poisson data recovery, show the good behaviour of the algorithm as well as promising 
results concerning the use of hybrid regularization techniques. 
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1 Introduction 



During the last decades, convex optimization methods have been shown to be very effective for 
solving inverse problems. On the one hand, algorithms such as Projection Onto Convex Sets 
(POCS) [21 El E] have become popular for finding a solution in the intersection of convex 
sets. POCS was used in data recovery problems [7] in order to incorporate prior information on 
the target image (e.g. smoothness constraints). Some variants of POCS such as ART (Algebraic 
Reconstruction Technique) [8] or PPM (Parallel Projection Method) [9j [10] were also proposed 
to achieve iteration parallelization. Additional variants of POCS can be found in [11 . Other 
parallel approaches such as block-iterative surrogate constraint splitting methods were considered 
to solve a quadratic minimization problem under convex constraints |12| which may include a total 
variation constraint (see also |13j). However the method in [12] based on subgradient projections 
is not applicable to non-differentiable objective functions. 

On the other hand, some denoising approaches were based on wavelet transforms [14] . and more 
generally on frame representations [HI [HI [T71 EH] . In pjJl [2Ql [2D [22] > algorithms which belong to 
the class of forward-backward algorithms were proposed in order to restore images degraded by 
a linear operator and a noise perturbation. Forward-backward iterations allow us to minimize a 
sum of two functions assumed to be in the class Tq(T-L) of lower semicontinuous convex functions 
defined on a Hilbert space H, and taking their values in ] — oo, +oo], which are proper (i.e. not 
identically equal to +oo). In addition, one of these functions must be Lipschitz differentiable on 
H. In [23] . this algorithm was investigated by making use of proximity operator tools [23] firstly 
proposed by Moreau in [25J. In [26J, applications to frame representations were developed and a 
list of closed form expressions of several proximity operators was provided. Typically, forward- 
backward methods are appropriate when dealing with a smooth data fidelity term e.g. a quadratic 
function and a non-smooth penalty term such as an ^i-norm promoting sparsity in the considered 
frame |27[ 128] . The computation of the proximity operator associated with the ^i-norm indeed 
reduces to a componentwise soft-thresholding [29 \ [30] . Another optimization method known as the 
Douglas- Rachford algorithm [3H [321 E21 133] was then proposed for signal/image recovery problems 
[34] to relax the Lipschitz differentiablity condition required in forward-backward iterations. In 
turn, the latter algorithm requires the knowledge of the proximity operators of both functions. 
This algorithm was then extended to the minimization of a sum of a finite number of convex 
functions [35], the proximity operator of each function still being assumed to be known. One of 
the main advantages of this algorithm called Parallel ProXimal Algorithm (PPXA) is its parallel 
structure which makes it easily implementable on multicore architectures. PPXA is well suited to 
deconvolution problems in the presence of additive Gaussian noise, where the proximity operator 
associated with the fidelity term takes a closed form [35]. To minimize a sum of two functions 
one of which is quadratic, another interesting class of parallel convex optimization algorithms 
was proposed by Fornasier et al. in |36[ 137] . However, in a more general context, particularly 
when the noise is not additive Gaussian and a wider class of degradation operators is considered, 
the proximity operator associated with data fidelity term does not have a closed form, which 
prevents the direct use of PPXA and the algorithms in [36J and |37j . Therefore, other solutions 
have to be looked for. For Poisson noise, a first solution is to resort to the Anscombe transform 
[38] . while a second one consists of approximating the Poisson data fidelity term with a gradient 
Lipschitz function [39] . Both approaches require the use of a nested iterative algorithm [38], [39] , 
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combining forward-backward and Douglas- Rachford iterations. Nested algorithms may however 
appear limited for two main reasons: the parallelization of the related iterations is difficult, and 
the number of functions to be minimized is in practice limited to three. More recently, approaches 
related to augmented Lagrangian techniques [1QJ [4_T] have been considered in [351 H31 US S5] • These 
methods are well-adapted when the linear operator is a convolution and Fourier diagonalization 
techniques can thus be used, but for more general linear degradation operators, a large-size linear 
system of equations has to be solved numerically at each iteration of the algorithm. 

The objective of this paper is to propose an adaptation of PPXA to minimize criteria used in a 
wide panel of restoration problems such as those involving a convolution or decimated convolution 
operator using a finite-support kernel and non-necessarily additive Gaussian noise. Decimated con- 
volutions are important in practice since they are often encountered in super-resolution problems. 
To apply proximal methods, it seems that we should be able to compute the proximity operator 
associated with the fidelity term for a large class of noise distributions. When the proximity oper- 
ator cannot be easily computed, we will show that a splitting approach may often be employed to 
circumvent this difficulty. This is the main contribution of this paper. 

Moreover, based on similar splitting techniques and in the spirit of existing works on decon- 
volution in the presence of Gaussian noise [STJ [35J, [46j [47] , a twofold regularization composed of a 
sparsity term and a total variation term is performed in order to benefit from each regularization. 
We will consider this type of hybrid regularization by investigating different discrete forms of the 
total variation. 

The paper is organized as follows: first, in Section [2j we present the considered restoration 
problem and the general form of the associated criterion to be minimized. Then, in Section [3j the 
definition and some properties of proximity operators as well as explicit forms related to the data 
fidelity term in a restoration context and to a discretization of the total variation are provided. 
Section introduces an accelerated version of PPXA which allows us to efficiently solve frame- 
based image recovery problems. Section 15.11 shows how the results obtained in the two previous 
sections can be used for solving restoration problems where a regularization is performed both in 
the spatial and in the wavelet domains. Finally, in Section 15. 2[ the effectiveness of the proposed 
approach is demonstrated by experiments for the restoration of images degraded by a blur (or a 
decimated blur) with finite-support kernel and by a Poisson noise. Some conclusions are drawn in 
Section [H 

2 Background 

2.1 Image restoration 

The degradation model considered throughout this paper is the following: 

z = V a (Ty) (1) 

where y denotes the original image of global size N degraded by a non-negative valued convolutive 
operator T : M. N — > R M and contaminated by a noise non necessarily additive, the effect of which 
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is denoted by T> a . Here, a is a positive parameter which characterizes the noise intensity. The 
vector z G M M represents the observed data of size M. For example, D a may denote the addition 
of a zero-mean Laplacian noise with standard-deviation a, or the corruption by an independent 
Poisson noise with scaling parameter a. T represents a convolution or a decimated convolution 
operator using a finite-support kernel. 

Our objective is to recover the image y from the observation z by using some prior information 
on its frame coefficients and its spatial properties. 

2.2 Frame representation 

In inverse problems, certain physical properties of the target solution y are most suitably expressed 
in terms of the coefficients x = (x^)i<k<K £ ^ K of its representation y = Ylk=l^^ e k with 
respect to a family of vectors (ek)\<k<K i n the Euclidean space M. N . Recall that a family of 
vectors (ek)i<k<K in constitutes a frame if there exist two constants y_ and V in ]0, +oo[ such 
thatH 

K 

(Vyem N ) e\\v\\ 2 <Y,\ e ^y\ 2 ^ v \\y\\ 2 - ( 2 ) 

The associated frame operator is the injective linear operator F: — > W K : y i— > (e^ 1 y)\<k<Ki 
the adjoint of which is the surjective linear operator F T : W K — > W N : (a;^)i<fc<K J2k=i x^et- 
When v_ = V = v in ([2]), [&k)\<k<K is sa id to be a tight frame. In this case, we have 

F T F = uld, (3) 

where Id denotes the identity matrix. A simple example of a tight frame is the union of v or- 
thonormal bases, in which case u = V = v. For instance, a 2D real (resp. complex) dual-tree 
wavelet decomposition is the union of two (resp. four) orthonormal wavelet bases [18]. Curvelets 
[15] constitute another example of tight frame. Historically, Gabor frames |48| have played an 
important role in many inverse problems. Under some conditions, contourlets [19] also constitute 
tight frames. When F~ x = F T , an orthonormal basis is obtained. Further constructions as well 
as a detailed account of frame theory in Hilbert spaces can be found in [50] . 

In such a framework, the observation model becomes 

z = V a (TF T x) (4) 

where x represents the frame coefficients of the original data (y = F T x £ M. N is the target data of 
size N). Our objective is now to recover x from the observation z. 

1 In finite dimension, the upper bound condition is always satisfied. 
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2.3 Minimization problem 



In the context of inverse problems, the original image can be restored by solving a convex opti- 
mization problem of the form: 

J 

Find x G Argmin 2^fj{ x ) (5) 

x£R K - =1 

where (fj)i<j<j are functions of r (R x ) (see |35| and references therein) and the restored image 
is y = F T x. 

A particular popular case is when J = 2; the minimization problem thus reduces to the 
minimization of the sum of two functions which, under a Bayesian framework, can be interpreted 
as a fidelity term fi linked to noise and an a priori term related to some prior probabilistic 
model put on the frame coefficients (some examples will be given in Section [5]). 

In this paper, we are especially interested in the case when J > 2, which may be fruitful for 
imposing additional constraints on the target solution. At the same time, when considering a frame 
representation (which, as already mentioned, often allows us to better express some properties of 
the target solution), the convex optimization problem ([5]) can be re-expressed as: 

s J 

Find x G Argmin gj(F T x) + fj( x ) (6) 

xeRK j=1 " j=s+1 J 

where {gj)i<j<s are functions of Tq(M. ) and (fj)s+i<j<J are functions of ro(M^), related to the 
image or to the frame coefficients, respectively. The terms for j G {1, . . . , S} related directly to the 
pixel values may be the data fidelity term, or a pixel range constraint term, whereas, the functions 
of indices j G {S+ 1, . . . , J} defined on frame coefficients are often chosen from some classical prior 
probabilistic model. For example, they may correspond to the minus log-likelihood of independent 
variables following generalized Gaussian distributions [51] . 

We will now present convex analysis tools which are useful to deal with such minimization 
problems. 

3 Proximal tools 

3.1 Definition and examples 

A fundamental tool which has been widely employed in the recent convex optimization literature 
is the proximity operator |24j first introduced by Moreau in 1962 [52], [25]. The proximity operator 
of if G ro(M x ) is defined as 

prox : M. x — > M. x : u \-t arg min - \\v — u\\ 2 + tp(v). (7) 
^ vGR x 2 
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Thus, if C is a nonempty closed convex set of H. , and ic denotes the indicator function of C, 
i.e., Vu 6 R x , = if u € C, +00 otherwise, then, prox tc reduces to the projection Pc onto 

C. Other examples of proximity operators corresponding to the potential functions of standard 
log-concave univariate probability densities have been listed in [231 123 E5]- Some of them will be 
used in the paper and we will thus recall the proximity operators of the potentials associated with 
a Gamma distribution (which is closely related to the Kullback-Leibler divergence |53] ) and with 
a generalized Gaussian distribution, before dealing with the Euclidean norm in dimension 2. 



Example 3.1 



Let a > and set 



ip: 



00, +00] 

-Xln(?/) + ar], if x > and 77 > 0; 

77 1->- < arj, if x = and 77 > 0; 

^+oo, otherwise. 



Then, for every 77 £ M, 



prox 77 



77 — a + x/l 7 ? _ a l 2 + 4 X 
V/ - 2 " 

Example 3.2 Let y > 0. p € [1. +00 [, and set 

ip: E ->■ ]-oo, +00] : 77 h-> xM P - 
Then, for every 77 £ M, prox^T? is given by 

sign(77) max{|77| — 0} if p = 1 

+ F^73 ((e - t/ ^-^ + t/) 1 / 3 ) 

where e = yjrj 2 + 256x 3 /729 if p = | 

Qx^ignfa) f 1 _J 1 + A%H ifp= 3 



7? + 

1+2X 

signfa)* 



MM 

9X ; 



if p = 2 
if p = 3 



(8) 



(9) 



(10) 



(11) 



where sign denotes the signum function. In Example 13.21 it can be noticed that the proximity 
operator associated with p = 1 reduces to a soft thresholding. 

Example 3.3 [35] Let \i > and set 

92: R 2 ^R: (7/1,7/2) ^/VM 2 + M 2 - (12) 

Then, for every (771,772) £ M 2 , 

x f(i- /, iL jf Wqa), if VWTW> a*; 

prox^ (771,772) = < V Vlml a +I»p[ 3 / (13) 
I (0, 0), otherwise. 
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3.2 Proximity operators involving a linear operator 



We will now study the problem of determining the proximity operator of a function j = $oT 
where T : R N -> R M is a linear operator, 

M 

^RM^J^^^.^H)^^ ^Vm(« (m) ) (14) 

m=l 

and, for every m € {1, . . . , M}, ^ m G ro(M). As will be shown next, the proximity operator of this 
function can be determined in a closed form for specific cases only. However, g can be decomposed 
as a sum of functions for which the proximity operators can be calculated explicitly. Firstly, we 
introduce a property concerning the determination of the proximity operator of the composition 
of a convex function and a linear operator, which constitutes a generalization of [344 Proposition 
11] for separable convex functions. The proof of the following proposition is provided in Appendix 

m 

Proposition 3.4 Let X G N*, Y G N* ; and let (o m )i< m <y be an orthonormal basis of R Y . Let 
T be a function such that 

Y 

(Vu G R Y ) T(u) = £ i> m (o m T u) (15) 

m=l 

where {ip m )i<m<Y ar e functions in Fo(M). Let L be a matrix in M. YxX such that 

Y 

LL7 = A m o m0 T (16) 

D m=l 

where (A m )i< m <y is a sequence of positive reals. 
Then T o L G Ti^lR^) and, for every v G M. x 

prox ToL v = v + L T D^ 1 (prox DT (Lt> ) — Lv) (17) 

where DT is the function defined by 

Y 

(Vu G M y ) DT(u) = A m ^ m (o m T u). (18) 

m=l 



The function ^ defined in (j!4j) is separable in the canonical basis of R M . However, for an 
abitrary convolutive (or decimated convolutive) operator L = T, (|16|) is generally not satisfied. 
Nevertheless, assume that (h)i<i<i is a partition of {1, . . . ,M} in nonempty sets. For every 
i G {1, ...,/}, let Mi be the number of elements in Ij (X[ =1 M i = M ) and let T « : rMi ~> 
]0,+oo[ : (u^) m&t i ^ Emei^m^)- If, for every < G {1,...,M}, ^ is the vector of R N 
corresponding to the i-th row vector of T, we have then g = Yli=i ° where T, is a linear 
operator from to R Mi associated with a matrix 

[V ••• Wj T (19) 

and I, = {mi, . . . , m^}- The following assumption will play a prominent role in the rest of the 
paper: 
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Assumption 3.5 For every i 6 {1, . . . , /}, (i m ) m gi. is a family of non zero orthogonal vectors. 

Then, g can be decomposed as a sum of / functions (Tj o Tj)i<j</ where, for every i S {1, . . . , /}, 
Di = TiT^ is associated with an invertible diagonal matrix Diag(Aj j i, . . . , A^mJ- According to 
Proposition 13.41 we have then, for every y 6 W N , 

prox x . oT .y = y + T? Dr 1 (prox D . x . (T iy ) - T iy ) . (20) 



Remark 3.6 1. Note that Assumption 13.51 is obviously satisfied when I = M, that is when, 
for every i £ {1, ...,/}, Ij reduces to a singleton. 

2. It can be noticed that the application of Tj or reduces to standard operations in signal 
processing. For example, when T corresponds to a convolutive operator, the application of Tj 
consists of two steps: a convolution with the impulse response of the degradation filter and a 
decimation for selected locations (m G I4). The application of Tj also consists of two steps: 
an interpolation step (by inserting zeros between data values of indices m £ Ij) followed by 
a convolution with the filter with conjugate frequency response. 

The fundamental idea behind the previously introduced partition (Ij)i<i<j, is to form groups 
of non-overlapping - and thus orthogonal - shifts of the convolution kernel so as to be able to 
compute the corresponding proximity operators. To reduce the number of proximity operators to 
be computed, one usually wants to find the smallest integer / such that, for every i £ {1, . . . ,/}, 
{irajm&i 1S an orthogonal family. For the sake of simplicity, we will consider the case of a ID 
deconvolution problem, where iV represents the original signal size whereas M corresponds to the 
degraded signal size, the extension to 2D deconvolution problems being straightforward. Different 
configurations concerning the impact of boundary effects on the convolution operator will be 
studied: first, we will consider the case when no boundary effect occurs. Then, boundary effects 
introduced by zero padding and by a periodic convolution will be taken into account. Finally, the 
special case of decimated convolution will be considered. Q designates in the sequel the length of 
the kernel and (0 q )o<q<Q its values. 



1. One-dimensional convolutive models without boundary effect. 
We typically have the following Toeplitz structure: 











(21) 



where M = N - Q + 1>Q. 

In order to satisfy Assumption 13.51 we can choose I = Q and, for every i £ {!,...,/}, 



I; 



{me{l,...,M} I ( 



m 



mo 



d/ = 0}. 



(22) 
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Hence, we have for all i E {1, . . . , /}, 



g=0 



(23) 



In this case, g can be decomposed as a sum of Q functions, whose proximity operators can 
be easily calculated. 

2. One-dimensional zero-padded convolutive models. 
The following Tceplitz matrix is considered: 

9o i 



■M. 











1Q-X 



■■ 

h 9o. 



(24) 



where M = N > 2Q. In this case, I can be chosen equal to Q and the index sets (Ii)i<i<i 
are still given by (|22p . However, the diagonal parameters are not all equal as in the previous 
example. We have indeed, for every i S {1, . . . , I}, 



A<,2 = 



q=(l 
= Ai,M, 



E 



(25) 



q=0 



3. One-dimensional periodic convolutive models. 

In this case, a matrix having a circulant structure [M] is involved: 



''M. 















0Q-1 



(26) 



where M = N > Q. In order to satisfy Assumption 13. 5| we subsequently set I = min{i > 
Q | (M — i) mod Q = 0} and, for every i G {1, . . . , /}, 



li = < 



{^} 

{me{Q,...,M} 



if t < Q - 1 
(m — i) mod Q = 0} 
otherwise. 



(27) 
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The diagonal parameters are then given by ()23p . 

Another choice which was made in [T] is to set I = min{i > Q \ M mod i = 0} and to 
proceed as in (|22p and (|23p . This solution may be preferred due to its simplicity, when the 
resulting value of I is small. 

4. One-dimensional (i-decimated zero-padded convolutive models. 
We get the following matrix of M MxAr where N = Md > 2Q. 



Vd-1 



#0 
h-l ■ ■ • 



9u 



(28) 



La/ J 



1Q-1 



Od- 



in order to satisfy Assumption I3.5I we subsequently set / = and the index sets (Ij)i<j</ 
are still given by ([22]) . We have indeed, for every i £ {1, . . . , /}, 




mm(id,Q) — 1 i a |2 



Emini 
q=0 



q=0 \ w q\ 



&i,Mi - Y,q=0 \9q\ 2 - 

Note that, when d> Q, (im) m e{i,...,M} ^ s an orthogonal family, and thus 1 = 1. 



(29) 



Remark 3.7 In the previous example (the non-decimated example being a special case when 
d = 1), the computational complexity of applying each operator Tj or with i £ {1, ...,/} is 
O(Md) and we have about Q/d proximity operators prox-^oTi to compute. Assuming a complexity 
O(Mj) for computing prox D . Ti , the overall computational complexity is 0(M(2Q + 1)). In turn, 
if we choose I = M, the complexity of computation of Tj or Tj is 0(Q), but we have about M 
proximity operators prox TjOTi to compute. Thus, the overall computational complexity remains of 
the same order as previously. This means that limiting the number of proximity operators to be 
computed has no clear advantage in terms of computational complexity, but it allows us to reduce 
the memory requirement (gain of a factor Md/Q for the storage of the results of the proximity 
operators). 



3.3 Discrete forms of total variation and associated proximity operator 

Total variation |55| represents a powerful regularity measure in image restoration for recovering 
piecewise homogeneous areas with sharp edges p2H [57j [HJ ISHJ [60] . Different versions of discretized 
total variation can be found in the literature [551 EH ES] . Our objective here is to consider discrete 
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versions for which the proximity operators can be easily computed. The main idea will be to split 
the total variation term in a sum of functions the proximity operators of which have a closed form. 
The considered form of the total variation of a digital image y = (yni,n 2 )o<ni<7Vi,o<n 2 <A r 2 ^ ^ xN2 
is 

iVi-Pi N2-P2 

MV) = E / 9 tv((yh)n 1 ,n 2 ,(^)n 1 ,n 2 ), (30) 

ni=0 n 2 =0 

where pt v G To(M 2 ), and yh and y v are two discrete gradients computed in orthogonal directions 
through FIR filters with impulse responses of size Pi x P 2 . More precisely, in the above expression, 
we have 

{(yh)ni,ri2 = tr(i? ^ni,n 2 ) 
(yv)ni,ri2 = tr(V ^n!,n 2 ) 

where H G M PlxP2 and V G M PlxP2 are the filter kernel matrices here assumed to have 
unit Frobenius norm, and for every (ni,n 2 ) G {0, ...,N\ — -Pi} x {0, ...,N2 — P2}, ^ni,n 2 = 
(yni+pi,n 2 +p 2 )o<pi<Pi,o<p 2 <P 2 denotes a block of Pi x P 2 neighbouring pixels. Since the proximity 
operator associated with the so-defined total variation does not take a simple expression in general, 
(|3tjp can be split in "block terms" by following an approach similar to that in Section 13.21 

A-IP2-1 

(Vy G R N ^ N >) tv(y) = ^ E tv PuvM (32) 

Pl=0 p 2 =0 

where, for every p\ G {0, . . . , Pi — 1} and p% € {0, . . . , P 2 — 1}, 

L i^lj_i L toj_i 

tv PliP2 (y)= £ £ Ptv((y ft )K,(y v K 2 J ( 33 ) 

rti=0 n 2 =0 

and the notation (^ni'^S = (^p^+p^p^+p;, has been used. A closed form expression for the 
proximity operator of the latter function can be derived as shown below (the proof is provided in 
Appendix [8]). 

Proposition 3.8 Under the assumption that tx(HV T ) = 0, for every 

V = (yn 1 ,n 2 )o<n 1 <N l ,0<n2<N2 £ R NlxN ' 2 

and fi > 0, we have 

(V(f>i,P2) G {0, . . . ,Pi - 1} x {0, . . . ,P 2 - 1}) prox MtVpi pa y = (ir ni , n2 ) < ni<Nli o< n2<N2 (34) 
where, for every (m, n 2 ) G {0, . . . , - 1} x {0, ... , - 1}, 

( 7r Pini+p 1 +p' 1 ,P 2 n 2 +p 2 +p 2 )o<pi<Pi,0<p^<P 2 = (/^nl',n 2 ~~ ^ni',n 2 )-^ + ( K nV,rt2 ~~ v n\'^i2^ + ^ni,n2 ( 3 ^) 

<% 2 2 = to(2T T I™g), « = tr(^ T F^) (36) 

iMni ,n 2 J 'Sii ,n 2 — P 1 UJ V p tv v 'ni ,n 2 > (y ni,n 2 A 
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and (V(m, n 2 ) G {0, . . . , N x - 1} x {0, . . . , N 2 - 1}) 



7T„ 



if { 



n\ < p\ or 
n 2 < P2 or 



or 



(38) 



The result in Proposition l3.8l basicallv means that, for a given value of (pi,/J 2 ) £ {0, . . . , Pi — 1} x 
{0, ...,P 2 — 1}, the image is decomposed into non-overlapping blocks YRl^ = 
(yPini+ P i+p' 1 ,P2n 2 +P2+p' 2 )o<p' 1 <Pi,o<p' 2 <P 2 of p i x p 2 pixels. Eq. (J35J) then provides the expres- 
sion of the proximity operator associated with each one of these blocks, whereas ([38]) deals with 
boundary effects. 

Remark 3.9 The above result offers some degrees of freedom in the definition of the discretized 
total variation for the choices of the function p tv and of the gradient filters. 



Two classical choices for the function p tv |55| are the following: 



1. If ptv '■ (rji, 772) ^ \ f]i\ + |f?2| then, an anisotropic form is obtained. According to Example 
(|371) reduces to 



aPi 
Pni 



P1,P2 
712 



si 



gn(/4i 



P2 

■ r>2 



max 



P2 I 

m,n2 1 



M,0) 



Kn\% = sign(^ 1 1 ^ 2 2 )max(|^ 1 1 '^ 2 2 | - p,0). 



(39) 



2. If p tv : (771,772) h-> y 7 (771) 2 + (77 2 ) 2 , then the standard isotropic form is found. The prox- 
imity operator involved in (|37p is given in Example! 



• Some examples of kernel matrices H and V satisfying the assumptions of Proposition 13.81 are 
as follows: 



1. Roberts filters such that H 
gated in 



-1/V2 

1/V2 



and V 



-17V2 
1/V2 



were investi- 



2. Finite difference filters can be used, which are such that H = V T 




-1/V2 1/^/2 




3. Prewitt filters also satisfy the required assumptions. They are defined by 

"-1/V6 1/V6" 

if = F T = -1/V6 1/V6 

-1/V6 1/V6 

4. Sobel filters such that 

-1/V12 1/VT2 

H = V 1 = -2/VT2 2/V12 

— 1/VI2 1/VT2 



are possible choices too. 
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4 Proposed algorithm 



In the class of convex optimization methods, an algorithm recently proposed in [35] appears well- 
suited to solve the class of the minimization problems formulated as in Problem ([5]). However, 
when synthesis frame representations are considered (Problem ([6])) and when the function number 
S is large, the frame analysis and synthesis operators have to be applied several times in the 
algorithm which induces a long computation time. In this section, we briefly recall the Parallel 
ProXimal Algorithm and its convergence properties. Then, we propose an improved version of 
PPXA to efficiently solve Problem ([6]). 

4.1 Parallel ProXimal Algorithm (PPXA) 

An equivalent formulation of the convex optimization problem (|5|) is: 

J 

Find x£ Argmin (40) 
xieR K ,...,xjem K j=\ 

X=X\ = ...=XJ 

This formulation was used in [35j to derive Algorithm [TJ 

Algorithm 1 General form of PPXA 
Set 7 G ]0,+oo[. 

For every j £ {1, . . . , J}, set {^j)i<j<j G]0, 1] J such that Ylj=i u j = !• 

Set (uj j0 )i<j<j 6 (R K ) J and x = Y^j=i w i n i,o- 
For £ = 0,1,... 
For j = 1,..., J 

L Pj,i = P rox 7/ 3 M%^ + 

Vl = E/=i 
Set X e G ]0,2[ 
For j = 1, . . . , J 

L = ^ + ^ ( 2 Pi~X£- pj,e) 

x i+1 = X £ + \ e (pi - xe) 



PPXA involves real constants 7 and (oOj)i<j<j, and, at each iteration £ £ N, a relaxation 
parameter A^. It also includes possible error terms ip,j^)\<j<j in the computation of the proximity 
operators, which shows the numerical stability of the algorithm. The sequence (xe)e>i generated by 
Algorithm [T] can be shown to converge to a solution to Problem (|40p (or equivalently to Problem ([5])) 
under the following assumption [35J. 

Assumption 4.1 

1. lim|j x .|i^ +00 fi(x) + . . . + f.j{x) = +00. 
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2. n/ =1 rint dom fj ^ 0E 

3. (Vj G {1, . . . , J}) £ £eN ^ IK/II < +°°- 
4- ZfeN ^ ( 2 ~ ^) = +°°- 

Remark 4.2 The fact that the algorithm involves several parameters should not be viewed as a 
weakness since the convergence is guaranteed for any choice of these parameters under the previous 
assumption. These parameters bring out flexibility in PPXA in the sense that an appropriate choice 
of them (typical values will be indicated in Section [5T2j) may be beneficial to the convergence speed. 

Consider now Problem ([6]) where a tight frame is employed (i ?T F = vld). By setting (Vj G 
{1, . . . , S}) fj = gjoF T and by invoking Proposition 13. 41 with L = F T and D = uld, the iterations 
of Algorithm Q] become as described in Algorithm [2j 

Algorithm 2 PPXA iterations for Problem ([6]) 
For £ = 0,1,... 
For j = 1, ...,£ 

L PjA = u 3,£+ 

±F(pTOX u ^ g . /0J .( FT Uj,t) - F T Uj/ ) + aj)i 
For j = S + 1,..., J 
[ Pj,i = Wox~ / f J /u 1] u j , e + a j/ 

Pi = £/=i WjPjA 
Set X e G ]0,2[ 
For j = 1, . . . ,J 

I u j,e+i = uj,£ + Xe (2 p £ - x e - p j} e) 
_ x t+1 = x £ + X t (p t - x t ) 



However, the first loop can be costly in terms of computational complexity because it requires 
to apply S times the operators F and F T at each iteration. We will now see how it is possible to 
speed up these iterations. 

4.2 Accelerated version of PPXA 

In Algorithm [3l we propose an adaptation of PPXA in order to reduce its computational load 
by limiting the number of times the operators F and F T are applied. Details concerning the 
derivation of this algorithm can be found in Appendix [9l 

Let us make the following assumption: 
Assumption 4.3 

2 The relative interior of a set 5 of M. x is designated by rint S and the domain of a function / : R x — >•] — oo, +00] 
is dom / = {x G R x \f(x) < +00}. 
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Algorithm 3 Accelerated PPXA 
Let 7 G ]0, +oo[. 

For every j G {1, . . . , J}, set (u)j)\<j<j G]0, 1] j such that Ylj=i u j = 1- 
Set (%o)i<j<j 6 (R^) J and x = Z)/=i w i%,0- 

For every j G {1, . . . , S 1 }, set u^q = F T u jfi and = u j)0 - j;Fv jt o- 
For ^ = 0, 1, . . . 
For j = 1,...,S 

For j = 5 + 1,..., J 

[ Pj,e = l'i'-'N-r _.,".;.» + 

P< = Ef=i 

+^ Ej=i ^ilht + £/=s+i w « 
ri = 2p e - x £ ; m = F T ri; rf = r £ - ~Fri 

Set Xt G ]0, 2[ 

For j = 

u^+i = uf^ + X e (rj- - ufj 

|_ Vjj+i = Vjj + A^ (re - uq^g) 
For j = S + 1,...,J 

I Uj,£+i = Uj,e + (re - Pj,e) 
x e+ i = xe + X e (pe - xi) 



1. lim|| x .| K+00 gi(F T x) + ... + g s (F T x) + /s+i(x) + . . . + fj(x) = +oo. 

2. ( n| =1 rint dom o F T )) f| ( n/ =5+1 rint dom /_,•) / 0. 

3. (Vj G {1, . . . , 5}) ^ |6N A^ lloj^H < +oo and (Vj G {S 1 + 1, . . . , J}) J2ieN ^ IK,dl < +°°- 
4 - EfeN ^ ( 2 ~ ^) = +°°- 

Then, Algorithm [3] converges to a solution to Problem ([6]). In addition, this algorithm requires 
only 3 applications of F or F T at each iteration. Hence, a gain w.r.t. Algorithm [2] is obtained as 
soon as S > 2. This fact will be illustrated by our simulation results in Section 15.2.11 



5 Application to restoration 
5.1 Hybrid regularization 

In restoration problems, one of the terms in the criterion to be minimized usually is a fidelity term 
measuring some distance between the image degraded by the operator T and the observed data 
z. We will assume that this function takes the form j = $oT where VP G To(]R M ). In the case of 
data corrupted by a additive zero-mean white Gaussian noise with variance a, a standard choice 
for VP is a quadratic function such that VP = ■ — z\\ 2 . Then, the associated proximity operator 
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of g can be computed explicitly (see [35]). In the case of data contaminated by an independent 
Poisson noise with scaling parameter a, a standard choice is \P = D^(z,a •) where -Dkl is the 
generalized Kullback-Leibler divergence [531 EH [Ml (MUM H31 EH such that, 



M 



(Vn = (n( m )) 1 < m < M G M A/ ), *(«) = ]T ^ m (u( m )) (41) 



m=l 



and 



VVn(u (m) ) 



\au<- m J J 

if z^ > and u^ 71 ) > 



(42) 

cm (m) if z^ = and r/" 1 ) > 0, 



^ +oo otherwise. 
The proximity operator of can then be derived from Example 13.11 

Concerning regularization functions, a standard choice of penalty function in the wavelet do- 
main is: (Vx = (x^)i<k<K £ M^), &(x) = Ylk=i ^k( x ^) where, for every k E {1, ... ,K}, is a 
finite function of To(M) such that lim| a ,( fc )|_ i>+00 (j)k(x^) = +oo. Power functions as in Example 13.21 
are often chosen for (<j)k)i<k<K (see e.g. [19^ 126]). The main problem with wavelet regularization 
is the occurence of some visual artefacts (e.g. ringing artefacts), some of which can be reduced 
by increasing the redundancy of the representation. Another popular type of regularization that 
can be envisaged consists of employing a total variation measure [55]. Its major drawback is the 
generation of staircase-like effects in the recovered images. To combine the advantages of both 
regularizations, we propose to: 

Find xeArgmin ^(TF T x) + fitv(F T x) + l c {F t x) + $(x). (43) 



As already mentioned, <!> corresponds to the regularization term operating in the wavelet domain, 
tv represents a discrete total variation term as defined by (I30D . Finally, lq is the indicator function 
of a nonempty closed convex set C of R-^ (for example, related to support or value range contraints). 
This kind of objective function was also recently investigated in [35] but the approach was restricted 
to the use of a quadratic data fidelity term and of a specific form of the total variation term. 

The non-negative real parameters •& and ji control the degree of smoothness in the wavelet and 
in the space domains, respectively. 

The main difficulty in applying Algorithm Q] to our restoration problem is that it requires 
to compute the proximity operators associated with each of the four terms in (|43p . In general, 
closed forms of the proximity operators are known only for the indicator function lq and for $ 
[26] . However, as explained in Section 13.21 provided that the function ^ is separable, the data 
fidelity term can be decomposed as a sum of / functions (T^ o Tj)i<j</ for which the proximity 
operators can be calculated according to ([20]) . Similarly, by using the results in Section I3T31 the 
tv function can be split in -P1-P2 functions (t~v Pl! p 2 )o<p 1 <p 1 fi<p 2 <P2i the proximity operators of 
which are given by Proposition 13.81 Algorithm [3] can then be applied with S = I + P1P2 + 1 and 
J = I + P1P2 + 2. In the present case, it can be noticed that if # > 0, Assumption l4.3lT|) is satisfied. 

In addition, Assumption l4.3l2| is fulfilled ( n[ =1 {y G M. N | Tjy £ rint dom T;} ) f| rint (since 
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dom$ = R K and (V(pi,p 2 ) G {0, . . . , Pi - 1} x {0, . . . , P 2 - 1}) domtv piiP2 = K^). This condition 
is verified if ]0,+oo[ M C dom^ and C = [0,255]^ since for every i G {1,...,/}, Tj has been 
assumed non- negative real valued in Section 12.11 and with non-zero lines (see Assumption 13. 5p . 

5.2 Experimental results for convolved data in the presence of Poisson noise 

In our simulations, we will be first interested in studying the performance in terms of convergence 
rate of the accelerated version of PPXA. Algorithms [2] and are implemented by setting 7 = 50, 
Xl = 1.6 and, for every j G {1, . . . , J}, 



if (gj)i<j<i are the functions corresponding to the decomposition of the data fidelity term and 
(9j)i+i<j<i+PiP2 correspond to the decomposition of tv. The weights are thus chosen to provide 
equal contributions to the four functions in Criterion (42) . For the first and second functions which 
are splitted, the corresponding 1/4 weight is further subdivided in a uniform manner. Note however 
that the behaviour of the algorithm did not appear to be very sensitive to an accurate choice of 
these parameters. A comparison between the different total variation regularization terms defined 
in Section [4.21 will also be made. Another discussion will be held concerning the boundary effects. 
Two cases will be considered: the use of a periodic convolution and then, of a convolution with 
zero-padding. Results for a decimated convolution will also be presented. Finally, the interest in 
combining total variation and wavelet regularization terms will be shown with respect to classical 
regularizations. A tight frame version of the dual-tree transform (DTT) proposed in [T8J (y = 2) 
using Symlets of length 6 over 3 resolution levels is employed. We choose potential functions of the 
form: for every k € {1, ... , K}, (f)k = Xk\ ■ \ Pk where Xk > and pk G {1, 4/3, 3/2, 2}, the proximity 
operators of which are given by Example 13.21 

5.2.1 Convergence rate comparison between PPXA and its accelerated version 

Table Q] gives the iteration numbers and the CPU times for the original PPXA algorithm and 
the proposed accelerated one in order to reach convergence when considering different image sizes 
("Sebal": N = 128 x 128, "Peppers": N = 256 x 256 and "Marseille": N = 512 x 512) and various 
kernel blur sizes. The stopping criterion is based on the relative error between the objective 
function computed at the current iteration and at the previous oneH The stopping tolerance has 
been set to 10 -3 . These results have been obtained with an Intel Core2 6700, 2.66 GHz. The last 
line of Table [1] illustrates the gain in CPU-time when using Algorithm O Moreover, in Figure [H 
the mean square error on the image iterates ||i ?T (x n — x)\\ 2 is plotted as a function of computation 
time, where (x n ) n> o denotes the sequence generated by Algorithm [2] or Algorithm [3j 

It can be noticed that the larger the kernel blur size is, the higher the gain is. This is due to 
the fact that the number of proximity operators to compute increases with the kernel size. 

3 The relative error was evaluated based on Criterion (|43[) where the indicator function was discarded. 



41 

1 



if 1 < j < I 

if / + 1 < j < I + P1P2 

otherwise 



(44) 



4PiP 2 
1 

k 4 
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Image size 


128 x 128 


256 x 256 


512 x 512 


(uniform) blur size 


3x3 


7x7 


3x3 


7x7 


3x3 


7x7 


Iteration numbers 


30 


50 


41 


50 


50 


50 


CPU time (in second) 


117.2 


633.0 


411.7 


1298 


1458 


4514 


CPU time - accelerated version (in second) 


13.53 


29.82 


60.59 


89.48 


263.6 


405.0 


Gain 


8.67 


21.2 


6.79 


14.5 


5.53 


11.1 



Table 1: Comparisons between PPXA and its accelerated version. 



4 
3.5 

3 
2.5 

2 
1.5 

1 

0.5 





Figure 1: Convergence profiles of the Algorithm [2] (dotted line) and Algorithm [3] (solid line) versus 
computation time in seconds for a 3 x 3 uniform blur (left) and a 7 x 7 uniform blur (right) and a 
128 x 128-image. 
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5.2.2 Comparison between different forms of total variation 

In Section \3^3\ we have introduced the proximity operator associated with discretized total variation 
functions and the possibility of choosing various filters has been mentioned. In Figure [2j tests have 
been carried out on "Peppers" degraded by a 3 x 3 uniform blur and corrupted by Poisson noise 
with scaling parameter a = 0.1. We compare the restored images for different kinds of total 
variation, in terms of Signal to Noise Ratio - SNR and structural similarity measure - SSIM [65J . 
The SSIM takes a value between -1 to 1, the maximum value being obtained for two identical 
images. Each curve represents the resulting SNR and SSIM versus p (the regularization parameter 
related to the total variation), for a given form of tv (i.e. a given filter associated with either an 
isotropic or anisotropic function ptv)- A small wavelet regularization parameter $ = 10 -3 has been 
chosen in order to better illustrate the influence of the different tv forms on restoration quality. 




Figure 2: SNR (left) and SSIM (right) for different total variation terms with respect to p. Roberts: 
thin-black line, Finite difference: thick-black, Prewitt: thick-gray, pt v = \/\ • | 2 + | • | 2 : solid line 
and p tv = | • | + | • | : dashed line 

It can be concluded from Figure [2] that the choice of the gradient filters and of the form 
(isotropic /anisotropic) of pt v has a significant influence on the restoration quality when the wavelet 
regularization is small. However, we also noticed in our numerical experiments that when the 
wavelet regularization parameter -d becomes larger, the choice of the tv form has a lower influence 
on the restoration quality provided that the regularization parameters are appropriately chosen. 

5.2.3 Boundary effects on restored images 

This section illustrates the influence of boundary effect processing. More precisely, we degraded 
an extended version of "Boat" image by a 7 x 7 uniform blur, and the resulting blurred image was 
cropped to create an image of size 256 x 256. As a consequence, the boundary values are functions of 
pixel locations which are no longer present in the blurred image. The scaling parameter associated 
with Poisson noise is a = 0.5. The objective is then to restore the image (which was centered) by 
using one of the convolution models discussed in Section 13.21 namely either a periodic convolution 
or a convolution with zero-padding. Visual and quantitative results are given in Figure [3j 
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SNR = 16.9 dB - SSIM = 0.62 SNR = 17.7 dB - SSIM = 0.64 
Figure 3: Periodic (left) and zero-padded (right) restoration. 

As it can be noticed from this figure, the periodic convolution model introduces significant 
boundary artefacts unlike the convolution with zero-padding. The results obtained when consider- 
ing "Peppers" led to the same conclusion. For "Sebal" , zero-padding or periodic models provided 
similar results. 

5.2.4 Decimated convolution 

We now present experimental results for a 256 x 256 SPOT image degraded by a uniform decimated 
blur with a kernel size Q = 3 x 3 and a decimated factor d = 2. The scaling parameter of the 
Poisson noise is equal to a = 1. Due to the structure of the degradation operator, the data fidelity 
is splitted in a sum of I = 4 functions. The results are presented in Figure H] where the good 
behaviour of the model can be observed. 




Original Degraded Restored ($ = l,/i = 10 3 ) 

SNR = 16.1 - SSIM = 0.79 



Figure 4: Restoration results for "Spot" image. 
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5.2.5 Influence of each regularization term 

We now present numerical and visual results for the different kinds of regularization when a 
generalized Kullback-Leibler divergence is used as a data fidelity term. This experiment allows 
us to compare the hybrid regularization with existing approaches based on a wavelet-frame |44j 
regularization or a total variation regularization [33~t 44 . The latter regularized solutions can be 
computed either by using augmented Lagrangian techniques |43[ 144] or with our splitting approach 
(by setting $ = or fj, = 0). In our experiments, the computation time of the two approaches 
was observed to be similar. Note that comparisons performed in [33] led to the conclusion that 
the wavelet-frame regularization is quite competitive with respect to other existing restoration 
methods [38j. 

In the images displayed in Figures [6] and [7J one can observe the artefacts related to the 
wavelet regularization, the staircase effects which are typical of the total variation penalization, 
some checkerboard patterns resulting of the chosen gradient discretization, and also the benefits 
which can be drawn from the use of a hybrid regularization. 

Similarly to [44], the values of \i and i? have been adjusted so as to maximize the SNR. 
Optimizing the hyperparameters manually as we did is a common practice in imaging applications, 
especially when a data set of test images having similar characteristics as the one to be restored 
(medical images, satellite images,...) is available. Automatic methods for the optimization of the 
hyperparameters can also be found in the literature such as cross-validation [66J, Stochastic EM 
[67] . MCMC [BSJES] or Stein-based methods [70]. These automatic procedures often are relatively 
intensive. They will not be addressed in this paper due to the lack of space. 



6 Conclusion 



A new convex regularization approach to restore data degraded by a (possibly decimated) con- 
volution operator and a non necessarily additive noise has been proposed. The main advantages 
of the method are (i) to deal directly with the "true" noise likelihood (i.e. the Kullback-Leibler 
divergence in the case of Poisson noise) without requiring any approximation of it; (ii) to permit 
the use of sophisticated regularization functions, e.g. one promoting sparsity in a wavelet frame 
domain and a total variation penalization. In addition, the proposed algorithm has a parallel 
structure which makes it easily implementable on multicore architectures. Numerical and visual 
results demonstrate the effectiveness of the proposed approach. One can note that, even if the 
paper is devoted to the case of convolutive operators, this approach could be generalized to more 
general linear operators. 

Note that the primal-dual approaches [Til [721 E31 E31 [75] can offer alternative solutions to the 
ones developed in this paper. However, one of the advantages of PPXA is that it easily leads to 
efficient parallel implementations 
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Degraded, a = 0.1 and uniform blur 3x3 Total variation regularization 

SNR = 8.88 dB - SSIM = 0.69 SNR = 11.2 dB - SSIM = 0.79 




Hybrid regularization (i? = 0.09,^ = 0.006) Wavelet-frame regularization 
SNR= 12.4 dB - SSIM= 0.85 SNR= 11.7 dB - SSIM= 0.83 



Figure 5: Restoration results for "Sebal" image. 
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Degraded, a = 0.1 and uniform blur 3x3 
SNR = 11.2 dB - SSIM = 0.27 




Hybrid regularization (d = 0.06,^ = 0.011) 
SNR=18.8 dB - SSIM=0.67 




Total variation regularization 
SNR = 17.8 dB - SSIM = 0.60 




Wavelet-frame regularization 
SNR=18.0 dB - SSIM= 0.62 



Figure 6: Restoration results for "Boat" image. 
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Degraded, a = 0.1 and uniform blur 3x3 Total variation regularization 

SNR = 11.4 dB - SSIM = 0.16 SNR = 22.1 dB - SSIM = 0.69 




Hybrid regularization (t? = 0.2, fi = 0.006) Wavelet-frame regularization 
SNR = 22.7 dB - SSIM = 0.74 SNR= 21.4 dB - SSIM = 0.68 



Figure 7: Restoration results for "Peppers" image. 
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7 Proof of Proposition 13.41 



Since D = LL T is the matrix associated with a bijective operator, L is associated with a surjective 
one and domT ^ dom (T o L) ^ 0. This allows us to conclude that h = T o L is a function 

of r (m x ). 

To calculate the proximity operator of h, we now come back to the definition of this operator. We 
have thus, for every w G WL X , 

prox^u; = arg min —\\v — w\\ 2 + T(Lv). (45) 

vm x 2 

We can write any vector v G M. x as a sum of an element L T t G ran L T and wj_ G (ran L T ) _L = ker L. 
We have then Lt; = LL T t = Dt. Similarly, we can write w = L T u + w± where u G M y and 
G kerL. So, prox^u; can be determined by finding 

min -\\L T t + v± - L T u - w± II 2 + T(Dt) 

{t,v ± )£R Y xR* 2 

= min J-\\L T (t-u)\\ 2 + hv ± -w ± \\ 2 + T(Dt). (46) 

(t,v ± )m Y xR x 2 2 

This yields 

= w± = w — L T u (47) 

and it remains to find 

min -\\L T (t - u)\\ 2 + T(Dt) = min -It - u) T D(t - u) + T(Dt). (48) 
t£R Y 2 tm Y 2 

By using the separability of T, this is equivalent to finding 

Y x 

min S~] -A m (o m T t - o m T u) 2 + Vm(A m o m T t). (49) 

t£R Y ^— ' 2 

m=l 

It can be deduced from |23|, Lemma 2.6] that, for every m G {1, . . . , Y}, 

o m T t = prox_i_^ m(Am . ) ( 0m T u) 



1 



= — prox Aml/)m (A m o m u), (50) 
which, according to [261 Proposition 2.10], leads to 

Y 

t = D' 1 ^ P r0X A ra ^( A m m T ") O m 
m=l 

= J D" 1 prox jDT (L>u). (51) 

Altogether, (gTD and (JSTJ) yield 

f = it; + L T (^D^ 1 piox DT (Du) — u). 

In addition, since L T u is the projection of w onto ranL T , u = (LL T )~ 1 Lw = D^ 1 Lw and (|17|) 
follows. 
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8 Proof of Proposition 13.81 



By using the proximity operator definition (J7|), 



7T = prox MVpi on (y) 



"P1,P2 

minimizes 

1 „ ll2 , , 

-|K-y|| +^tv pi>P2 (7r) 

= 2 ( 7r ' l i> n 2 — Vnitm) 

(ni,n 2 )eB 

L ^-Wj_ 1L taj_a 



E E {^rea-^,!SllF + /*Ptv(M^ T ^),tr(^ T ^))} (52) 

ni=0 ri2=0 

where ||-||f denotes the Frobenius norm, for every (m, n-z) G {0, . . . , [ ^p^ \— l}x{0, . . . , [ ^^P 2 j — 
1}, 

(53) 



and 

B = {(ni,n 2 ) G N 2 | < n x < p 1 or < n 2 < p 2 
orAL^-^J ^nxK^ 

or p 2 L^_^j <n 2 <iV 2 }. (54) 
"2 

It is then clear that (|38p holds since the variables vr niin2 with (ni,n 2 ) G B are not elements of the 
matrices with m € {0, . . . , L^^-J - 1} and n 2 G {0, . . . , L^^J - !}• In addition, since 

it has been assumed that tr(H T V) = and ||J?||f = ||V||f = 1> the matrices Hn\'^i 2 an d Ynlinl can 
be decomposed in an orthogonal manner as follows: 



TTP1.P2 _ oPl,P2 TT , P1,P2 , /TTP1.P2 \_L 

1 - L ni,n2 — Hni > n2 11 < "mi,n2 * t v 11 ni,n2y 

\rPl,P2 _ UP1,P2 IT , „,P1,P2 T/ | f\rpi,pa\X 

1 ni,ri2 — li ni ) n2 11 T u ni,rt2 v '\ 1 ni,n2/ 



(55) 



where 



QPi,P2 — f r (ff T Tl PuP2 ) 

<\%=HV T U p \%) 1 (56) 

(TXP1,P2 \± — TTP1-P2 _ flPl,P2 JJ _ K P1,P2 y 

\ ri\,ri2) ni,ri2 "m,ri2 ni,n2 ' 

(YPl,P2\±- — VP\,P2 _ UPl,P2 TT _ pi,P2 y (x>7\ 
V- I ni,n2/ ni,n 2 n\,ri2 m,n,2 ' \ I 
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and (/ini'n 2 2 ; ^ni'n.2) is given by (|36p . After some simplications, we have thus to minimize, for every 
n x G {0, .' . . , L%^J - 1} and n 2 G {0, . . . , L^™J - 1}, 



= \\\(K\% 2 2 ) ± - W^lll 

4. _(/=}Pl>P2 _ J,Pl,P2 ^2 
^ 2^ n li n 2 ni,Tl2/ 

1 

_) ( K P1,P2 _ ,,Pl,P2 \2 

^ 2^ n ii n 2 ni,fi2/ 

+ M*vO«><&)- (58) 

This shows that (|37|) is satisfied and that (HnVjriL) = {Yni^) • Eq. ([35|) straightforwardly follows. 

9 Derivation of Algorithm [3] 

Let denote the projector on rani 7 . For every u G R , we have 

u = Il F u + u- L (59) 
where u -1- G (rani 7 ) -1 " is the projection error and there exists g G such that 

n F u = Fq. (60) 
By combining this with the fact that F T w L = 0, we obtain the relation, 

q = ^F T (61) 

which allows us to deduce from ([59]) that 

u L = u- —FF T u. 
v 

Now, consider the first step of Algorithm [2} (Vj G {1, . . . , S}) 

F -r -r 

Pj,£ = Uj,£ + — (prox^y^^ Utf) - F u j>£ ) + a ji£ (62) 

where a^i is assumed to belong to r&nF, i.e. a,j£ = Faj^ with aj/ G R . Defining qj^ G R~^ 
similarly to ([60]) yields Iljrp^ = i 7 ^. According to (f6T|) . is such that 



By combining (|62|) and 

Qj,£ = -WOXvygj/ujivM) where V j,i = F ^ U j/' ( 64 ) 
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Moreover, since p 3 ^ = Fq^e + pj-g, the computation of the variable pe = Ylj=i ^jPj/ m Algorithm 
[2] can be rewritten as 

S S J 

Pi = Fj2^me + J2^ p ii + Yl ( 65 ) 

j=l 3=1 j=S+l 

where, according to ([59]), ([60D , ([621) and (1531) . 

pf,i = u j,i ~ \ FF ~ Xu 3\i = u f,e- ( 66 ) 

In the new formulation, the last steps of the algorithm consist of updating uj~ e and Vj t £, for all 
j G {1, . . . , 5}. We propose to define rg = 2p£ — xe , re = F T re and rf = re — jjFfg, which yields 
Vj,e + i = v jt £ + \e(re - F T pj^e) and u^ e+1 = uf >t + A^(r^ - pj t ). By using flMJ) and (J66J) , these 
relations can be simplified as 

Vj,e + i = Vjj + Xe ire, - vq ji£ ) 
< and (67) 
k uf /+1 = ufo + Xe {if - uf t ) , 

which leads to Algorithm [3l 

We finally note that Assumption I4.3l l3|) implies that Assumption 14, 1| |3|) is satisfied since, for every 
je{l,...,S}, 

^ \\a j: e\\ = Xe \\Fa jit \\ < \\F\\ X e \\aj,t\\ < +oo. (68) 
£sn een em 

This allows us to transpose the convergence results concerning Algorithm [2] to Algorithm [3l 
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